Model Selection Under Covariate Shift
Abstract
A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, extrapolation, or active learning scenarios. The violation of this assumption, known as covariate shift, causes a heavy bias in standard generalization error estimation schemes such as cross-validation, which therefore lead to poor model selection. In this paper, we propose an alternative estimator of the generalization error. Under covariate shift, the proposed estimator is unbiased if the learning target function is included in the model at hand, and it is asymptotically unbiased in general. Experimental results show that model selection with the proposed generalization error estimator compares favorably with cross-validation in extrapolation.
Related Resources
Covariate Shift Adaptation by Importance Weighted Cross Validation
A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training input points and test input points follow different distr...
Selection Bias Correction in Supervised Learning with Importance Weight. (L'apprentissage des modèles graphiques probabilistes et la correction de biais sélection)
In the theory of supervised learning, the identical assumption, i.e. the training and the test samples are drawn from the same probability distribution, plays a crucial role. Unfortunately, this essential assumption is often violated in the presence of selection bias. Under such condition, the standard supervised learning frameworks may suffer a significant bias. In this thesis, we use the impo...
Robust Covariate Shift Regression
In many learning settings, the source data available to train a regression model differs from the target data it encounters when making predictions due to input distribution shift. Appropriately dealing with this situation remains an important challenge. Existing methods attempt to “reweight” the source data samples to better represent the target domain, but this introduces strong inductive bia...
Input-Dependent Estimation of Generalization Error under Covariate Shift
A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, extrapolation, active learning, or classification with imbalanced data. The violation of this assumption, known as covariate shift, causes a heavy bias in standard generalization error estimation sch...
Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
When training and test samples follow different input distributions (i.e., the situation called covariate shift), the maximum likelihood estimator is known to lose its consistency. For regaining consistency, the log-likelihood terms need to be weighted according to the importance (i.e., the ratio of test and training input densities). Thus, accurately estimating the importance is one of the key...
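The importance weighting described in the last abstract can be illustrated with a minimal sketch. It is not taken from any of the listed papers. It assumes Gaussian noise, so that maximizing the weighted log-likelihood reduces to weighted least squares, and it assumes both input densities are known Gaussians, so the importance (the ratio of test to training input densities) can be computed exactly instead of estimated. The function names, the toy target `np.sinc`, and all distribution parameters are hypothetical choices made for illustration only.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution evaluated at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))

def importance_weights(x, train_params, test_params):
    """Importance: ratio of test to training input densities at x."""
    return gaussian_pdf(x, *test_params) / gaussian_pdf(x, *train_params)

def weighted_least_squares(x, y, w):
    """Fit y ~ a + b*x by minimizing sum_i w_i * (y_i - a - b*x_i)**2."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary LS
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Assumed toy setup: training inputs ~ N(1, 0.5^2), test inputs ~ N(2, 0.3^2).
rng = np.random.default_rng(0)
train_params = (1.0, 0.5)
test_params = (2.0, 0.3)

x_train = rng.normal(*train_params, size=500)
y_train = np.sinc(x_train) + 0.1 * rng.normal(size=500)

w = importance_weights(x_train, train_params, test_params)
coef_plain = weighted_least_squares(x_train, y_train, np.ones_like(w))
coef_iw = weighted_least_squares(x_train, y_train, w)
```

Here `coef_plain` is the ordinary fit and `coef_iw` is the importance-weighted fit of the same misspecified linear model. In practice the densities are unknown, which is why estimating the importance accurately is the key issue that abstract raises.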